FiGURe: Simple and Efficient Unsupervised Node Representations with Filter Augmentations
Unsupervised node representations learnt using contrastive learning-based methods have shown good performance on downstream tasks. However, these methods rely on augmentations that mimic low-pass filters, which limits their performance on tasks that depend on other parts of the eigen-spectrum. This paper presents a simple filter-based augmentation method to capture different parts of the eigen-spectrum. We show significant improvements using these augmentations. Further, we show that the same weights can be shared across these different filter augmentations, reducing the computational load. In addition, previous works have shown that good performance on downstream tasks requires high-dimensional representations. Working in high dimensions increases the computational cost, especially when multiple augmentations are involved. We mitigate this problem and recover good performance with lower-dimensional embeddings by using simple random Fourier feature projections. Our method, FiGURe, achieves an average gain of up to 4.4% over state-of-the-art unsupervised models across all datasets considered, both homophilic and heterophilic.
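The random Fourier feature projection mentioned above can be illustrated with a minimal sketch. It uses the standard construction for approximating an RBF kernel; the abstract does not specify the exact variant, so the function name `random_fourier_features`, the bandwidth parameter `gamma`, and the dimensions below are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def random_fourier_features(X, out_dim, gamma=1.0, seed=0):
    """Map embeddings X (n x d) to out_dim random Fourier features.

    Hypothetical helper: approximates the RBF kernel
    k(x, y) = exp(-gamma * ||x - y||^2) via z(x) . z(y).
    """
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    # Frequencies are drawn from the Fourier transform of the RBF kernel,
    # i.e. a Gaussian with variance 2 * gamma per coordinate.
    W = rng.normal(0.0, np.sqrt(2.0 * gamma), size=(d, out_dim))
    # Random phase shifts, uniform on [0, 2*pi).
    b = rng.uniform(0.0, 2.0 * np.pi, size=out_dim)
    return np.sqrt(2.0 / out_dim) * np.cos(X @ W + b)

# Example: lift 32-dimensional embeddings (assumed size) to 512 features.
low_dim = np.random.default_rng(1).normal(size=(5, 32))
lifted = random_fourier_features(low_dim, out_dim=512, gamma=0.01)
```

In this sketch the encoder only has to produce the small `low_dim` embeddings; the projection is fixed and untrained, so it adds no learned parameters.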
Task-Robust Pre-Training for Worst-Case Downstream Adaptation
Pre-training has achieved remarkable success when transferred to downstream tasks. In machine learning, we care not only about the good performance of a model but also about its behavior under reasonable shifts of condition. The same philosophy holds when pre-training a foundation model. However, a foundation model may not behave uniformly well across a series of related downstream tasks. This happens, for example, in mask recovery regression when the required recovery ability or the training instances diverge: pattern features may be extracted predominantly during pre-training, while semantic features are also required on a downstream task.
OOD K. α
Based on R1's comments, we also evaluated the models based on mutual Theoretically, the two metrics bring similar information [C]. For these reasons, we decided to use APR. We attribute the strong performance of PostNet to the dim. Similar conclusions have been drawn in [E]. In our paper, we use 5 random splits (60%, 20%, 20%).
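The "5 random splits (60%, 20%, 20%)" protocol can be sketched as follows. The text does not say what each fraction is used for, so reading them as train/validation/test is an assumption, as are the helper name `random_splits` and the seed.

```python
import numpy as np

def random_splits(num_items, num_splits=5, fractions=(0.6, 0.2, 0.2), seed=0):
    """Return num_splits random partitions of range(num_items).

    Hypothetical helper: the three parts are assumed to be
    train / validation / test index arrays.
    """
    rng = np.random.default_rng(seed)
    n_first = round(fractions[0] * num_items)
    n_second = round(fractions[1] * num_items)
    splits = []
    for _ in range(num_splits):
        perm = rng.permutation(num_items)
        first = perm[:n_first]
        second = perm[n_first:n_first + n_second]
        third = perm[n_first + n_second:]  # remainder goes to the last part
        splits.append((first, second, third))
    return splits

splits = random_splits(num_items=100)
```

Results would then typically be averaged over the five partitions rather than reported for a single one.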
We thank all of the reviewers for their valuable comments and suggestions. We have replaced "Iterations" with "Time" in Figure 1. R2: Wall-clock time comparison and # of potential functions being evaluated. Please refer to Figure 3. Additionally, R2: Are there problems on which vanilla Gibbs would be prohibitively expensive? R2: Are there problems on which Poisson-Gibbs might fail? Empirically, we did not find a poor initialization issue for Poisson-Gibbs.
M-flows
We thank the reviewers for their insightful feedback! Our goal is not to reduce the dimensionality further below n. What are the convergence properties of the proposed training method (R4)? Is the sequential or alternating training scheme better (R4)? It would be nice to have a different metric to compare the models (R1).